Temporal interval


02e978a2cc9a1d0d4376a7deb01db612-Supplemental-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems

In Figures 2 and 3, we provide examples of simulated and real satellite image sequences of wildfire. We implement the S2R-FireTr model for wildfire forecasting and backtracking using PyTorch. We set the batch size to 4. Each training sequence contains six frames, each resized to 256. We train S2R-FireTr for ten epochs. In Table 1, we present the performance of S2R-FireTr when trained with different temporal intervals. Satellites orbiting the Earth typically revisit a location at relatively large temporal intervals, which aligns with the training data.
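As a hedged illustration of how six-frame training sequences at different temporal intervals could be drawn from a longer satellite image stream (the function and the list-of-frames representation are assumptions for this sketch, not the paper's code):

```python
def subsample_sequence(frames, interval, length=6):
    """Take every `interval`-th frame until `length` frames are collected.

    Returns None when the source sequence is too short to yield a full
    training sequence at the requested temporal interval.
    """
    picked = frames[::interval][:length]
    return picked if len(picked) == length else None

# A 24-frame source sequence yields a six-frame clip at interval 4;
# a 10-frame sequence is too short at that interval.
frames = list(range(24))
clip = subsample_sequence(frames, interval=4)
```

Larger `interval` values mimic the sparser revisit times of real satellite orbits, which is presumably why training across several intervals matters.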


TimeScope: Towards Task-Oriented Temporal Grounding In Long Videos

Liu, Xiangrui, Qin, Minghao, Shu, Yan, Liang, Zhengyang, Tian, Yang, Zhang, Chen Jason, Zhao, Bo, Liu, Zheng

arXiv.org Artificial Intelligence

Identifying key temporal intervals within long videos, known as temporal grounding (TG), is important to video understanding and reasoning tasks. In this paper, we introduce a new form of the temporal grounding problem, Task-oriented Temporal Grounding (ToTG), which is driven by the requirements of downstream tasks rather than explicit time-interval descriptions. For example, a ToTG input may be "explain why the man in the video is sent to the hospital," whereas traditional TG would take an explicit temporal description such as "the moments when the man is tripped by a stone and falls to the ground." This new ToTG formulation presents significant challenges for existing TG methods, as it requires jointly performing deep task comprehension and fine-grained temporal localization within long videos. To address these challenges, we conduct a systematic set of studies. First, we construct a new benchmark, ToTG-Bench, which comprehensively evaluates ToTG performance across diverse settings. Second, we introduce a new temporal grounding method, TimeScope, which performs coarse-to-fine localization through a progressive reasoning process. Leveraging extensive supervised fine-tuning with carefully curated chain-of-thought (CoT) data from a variety of scenarios, TimeScope generalizes effectively across tasks and domains. Our evaluation demonstrates TimeScope's empirical advantages over existing baselines from three perspectives: (1) substantial improvements in grounding precision, (2) significant benefits to downstream tasks, and (3) strong generalizability across different scenarios. All models, datasets, and source code will be fully open-sourced to support future research in this area.
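Grounding precision in TG benchmarks is conventionally scored with temporal intersection-over-union between predicted and gold intervals. A minimal sketch of that metric (the function names and threshold are the usual convention, not taken from this paper):

```python
def temporal_iou(pred, gold):
    """Intersection-over-union of two (start, end) intervals in seconds."""
    (ps, pe), (gs, ge) = pred, gold
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - inter
    return inter / union if union > 0 else 0.0

def recall_at_iou(preds, golds, threshold=0.5):
    """Fraction of predictions whose IoU with the gold interval meets
    the threshold -- the common R@IoU grounding metric."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, golds))
    return hits / len(golds)
```

For instance, a prediction of (0, 10) against a gold interval of (5, 15) overlaps by 5 seconds over a 15-second union, giving an IoU of 1/3.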


RAVEN: Robust Advertisement Video Violation Temporal Grounding via Reinforcement Reasoning

Ji, Deyi, Yang, Yuekui, Wu, Haiyang, Ma, Shaoping, Chen, Tianrun, Zhu, Lanyun

arXiv.org Artificial Intelligence

Advertisement (Ad) video violation detection is critical for ensuring platform compliance, but existing methods struggle with precise temporal grounding, noisy annotations, and limited generalization. We propose RAVEN, a novel framework that integrates curriculum reinforcement learning with multimodal large language models (MLLMs) to enhance reasoning and cognitive capabilities for violation detection. RAVEN employs a progressive training strategy, combining precisely and coarsely annotated data, and leverages Group Relative Policy Optimization (GRPO) to develop emergent reasoning abilities without explicit reasoning annotations. A sophisticated hierarchical reward mechanism ensures precise temporal grounding and consistent category prediction. Experiments on industrial datasets and public benchmarks show that RAVEN achieves superior performance in violation category accuracy and temporal interval localization. We also design a pipeline to deploy RAVEN in online Ad services, and online A/B testing further validates its practical applicability, with significant improvements in precision and recall. RAVEN also demonstrates strong generalization, mitigating the catastrophic forgetting issue associated with supervised fine-tuning.
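One plausible shape for a hierarchical reward that couples temporal grounding with category prediction is to gate a temporal-IoU term on a correct category, so category errors dominate. This is an illustrative sketch under stated assumptions; the weights and gating rule are hypothetical, not RAVEN's actual reward:

```python
def violation_reward(pred_interval, pred_category,
                     gold_interval, gold_category,
                     w_time=0.7, w_cat=0.3):
    """Illustrative hierarchical reward: a category-consistency term plus
    a temporal-IoU term that only pays out when the category is right."""
    ps, pe = pred_interval
    gs, ge = gold_interval
    inter = max(0.0, min(pe, ge) - max(ps, gs))
    union = (pe - ps) + (ge - gs) - inter
    iou = inter / union if union > 0 else 0.0
    cat = 1.0 if pred_category == gold_category else 0.0
    # Gating the IoU term on `cat` makes category correctness a
    # prerequisite for any localization credit.
    return w_cat * cat + w_time * iou * cat
```

Under GRPO, a scalar reward of this kind would be compared across a group of sampled responses, so only the relative ordering it induces matters.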




Iterative Zoom-In: Temporal Interval Exploration for Long Video Understanding

Li, Chenglin, Chen, Qianglong, Fengtao, Zhang, Yin

arXiv.org Artificial Intelligence

Multimodal Large Language Models (MLLMs) have shown strong performance in video understanding tasks. However, they continue to struggle with long-form videos because of an inefficient perception of temporal intervals. Unlike humans, who can dynamically adjust their temporal focus to locate query-relevant moments, current MLLMs often rely on dense, uniform sampling across the video timeline, leading to high memory consumption and a risk of missing crucial information. To address this challenge, we introduce Temporal Search (TS), a training-free framework that enables MLLMs to iteratively explore temporal regions for improved long video understanding. TS is based on a key observation: the model's generation confidence across different temporal intervals is highly correlated with prediction accuracy. TS operates through two main iterative stages. First, the MLLM proposes a temporal interval that is likely to contain task-relevant information. Then, it samples a fixed number of frames from the interval, regardless of length, and feeds them into the model to produce a refined response and confidence score. TS refines the focus of the model by iteratively shifting attention to more fine-grained temporal intervals, improving its understanding of long videos. Additionally, keyframe-level descriptions are collected to facilitate cross-interval perception throughout the video. To further improve efficiency, we introduce TS-BFS, a best-first search strategy over a tree. Each node represents a candidate interval and is expanded via two methods: self-driven proposals and uniform partitioning. Nodes are scored based on confidence and self-evaluation, and the most promising one is selected for continued exploration.
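The best-first search over candidate intervals can be sketched with a priority queue. In this sketch, `score(interval)` stands in for the MLLM's confidence/self-evaluation, and expansion uses only uniform partitioning (self-driven proposals would require the model itself); everything here is an illustrative assumption, not the paper's implementation:

```python
import heapq

def ts_bfs(score, start=0.0, end=600.0, max_expansions=8, branch=2):
    """Best-first search over temporal intervals of a video timeline.

    `score` maps an (start, end) interval to a confidence in [0, 1];
    higher is better. Each expansion uniformly partitions the popped
    interval into `branch` children.
    """
    best = (start, end)
    best_score = score(best)
    # heapq is a min-heap, so store negated scores to pop the best first.
    frontier = [(-best_score, best)]
    for _ in range(max_expansions):
        if not frontier:
            break
        neg, (s, e) = heapq.heappop(frontier)
        if -neg > best_score:
            best, best_score = (s, e), -neg
        step = (e - s) / branch
        for i in range(branch):
            child = (s + i * step, s + (i + 1) * step)
            heapq.heappush(frontier, (-score(child), child))
    return best, best_score
```

Because only the highest-scoring node is expanded at each step, the search zooms in on promising regions without densely sampling the whole timeline.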


ATI-CTLO: Adaptive Temporal Interval-based Continuous-Time LiDAR-Only Odometry

Zhou, Bo, Wu, Jiajie, Pan, Yan, Lu, Chuanzhao

arXiv.org Artificial Intelligence

The motion distortion in LiDAR scans caused by aggressive robot motion and varying terrain features significantly impacts the positioning and mapping performance of 3D LiDAR odometry. Existing distortion correction solutions often struggle to balance computational complexity and accuracy. In this work, we propose ATI-CTLO, an Adaptive Temporal Interval-based Continuous-Time LiDAR-Only Odometry method that uses straightforward and efficient linear interpolation. Our method flexibly adjusts the temporal intervals between control nodes according to the dynamics of motion and environmental characteristics. This adaptability enhances performance across various motion states and improves robustness in challenging, feature-sparse environments. We validate the effectiveness of our method on multiple datasets across different platforms, achieving accuracy comparable to state-of-the-art LiDAR-only odometry methods. Notably, in scenarios involving aggressive motion and sparse features, our method outperforms existing solutions.
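The two core ideas, linear interpolation between timestamped control nodes and motion-dependent node spacing, can be sketched as follows. This is a simplified illustration: it interpolates translation only (rotations would need spherical interpolation over quaternions), and the adaptive-spacing heuristic is an assumption, not the paper's exact rule:

```python
def interp_pose(t, nodes):
    """Linearly interpolate a translation at timestamp `t` between the
    two control nodes that bracket it. `nodes` is a time-sorted list of
    (timestamp, (x, y, z)) pairs."""
    for (t0, p0), (t1, p1) in zip(nodes, nodes[1:]):
        if t0 <= t <= t1:
            a = (t - t0) / (t1 - t0)
            return tuple(c0 + a * (c1 - c0) for c0, c1 in zip(p0, p1))
    raise ValueError("timestamp outside control-node range")

def node_interval(angular_rate, base=0.1, min_dt=0.02):
    """Illustrative heuristic: shrink the temporal interval between
    control nodes as motion becomes more aggressive, with a floor."""
    return max(min_dt, base / (1.0 + angular_rate))
```

Each LiDAR point, stamped within a scan, would then be de-skewed by interpolating the sensor pose at its own timestamp before being registered to the map.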


The Common Core Ontologies

Jensen, Mark, De Colle, Giacomo, Kindya, Sean, More, Cameron, Cox, Alexander P., Beverley, John

arXiv.org Artificial Intelligence

The Common Core Ontologies (CCO) are designed as a mid-level ontology suite that extends the Basic Formal Ontology. CCO has since been increasingly adopted by a broad group of users and applications and is proposed as the first standard mid-level ontology. Despite these successes, documentation of the contents and design patterns of the CCO has been comparatively minimal. This paper is a step toward providing enhanced documentation for the mid-level ontology suite through a discussion of the contents of the eleven ontologies that collectively constitute the Common Core Ontology suite.


Configuration Planning with Temporal Constraints

Köckemann, Uwe (Örebro University) | Karlsson, Lars (Örebro University)

AAAI Conferences

Configuration planning is a form of task planning that takes into consideration both causal and information dependencies in goal achievement. This type of planning is interesting, for instance, in smart home environments which contain various sensors and robots to provide services to the inhabitants. Requests for information, for instance from an activity recognition system, should cause the smart home to configure itself in such a way that all requested information will be provided when it is needed. This paper addresses temporal configuration planning, in which information availability and goals are linked to temporal intervals which are subject to constraints. Our solutions are based on constraint-based planning, which uses different types of constraints to model different types of knowledge. We propose and compare two approaches to configuration planning. The first one models information via conditions and effects of planning operators and essentially reduces configuration planning to constraint-based temporal planning. The second approach solves information dependencies separately from task planning and optimizes the cost of reaching individual information goals. We compare these approaches in terms of the time it takes to solve problems and the quality of the solutions they provide.
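The key temporal constraint, that information must be available over the whole interval in which it is needed, can be sketched as a simple interval-coverage check. The data shapes below are assumptions chosen for illustration, not the paper's constraint formulation:

```python
def covers(provider, consumer):
    """True when the interval over which information is available fully
    covers the interval during which it is needed (Allen's 'during'
    or 'equals' relation between consumer and provider)."""
    ps, pe = provider
    cs, ce = consumer
    return ps <= cs and ce <= pe

def consistent(plan, goals):
    """Check every information goal against the availability interval of
    whatever provides it. `plan` maps information name -> availability
    interval; `goals` maps information name -> required interval."""
    return all(info in plan and covers(plan[info], need)
               for info, need in goals.items())
```

A full planner would of course propagate such constraints through a temporal constraint network rather than check fixed intervals, but the coverage condition itself is the same.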


Data-Driven Fashion Design – Stitch Fix Technology – Multithreaded

#artificialintelligence

A core methodology at Stitch Fix is blending recommendations from machines with judgments of expert humans. Our machines produce recommendations via algorithms operating over structured data, while our human stylists curate and modify these recommendations on the basis of unstructured data and knowledge that isn't yet reflected in our dataset (e.g., new fashion trends). This helps us choose the best 5 items to offer each client in each fix. The success of this strategy within our styling organization prompts consideration of how machines and humans might be brought together in the realm of fashion design. In this post we describe one implementation of such a system.


A Temporal Bayesian Network for Diagnosis and Prediction

Arroyo-Figueroa, Gustavo, Sucar, Luis Enrique

arXiv.org Artificial Intelligence

Diagnosis and prediction in some domains, like medical and industrial diagnosis, require a representation that combines uncertainty management and temporal reasoning. Based on the fact that in many cases there are few state changes in the temporal range of interest, we propose a novel representation called Temporal Nodes Bayesian Networks (TNBN). In a TNBN, each node represents an event or state change of a variable, and an arc corresponds to a causal-temporal relationship. The temporal intervals can differ in number and size for each temporal node, allowing multiple temporal granularities. Our approach is contrasted with a dynamic Bayesian network for a simple medical example. An empirical evaluation is presented for a more complex problem, a subsystem of a fossil power plant, in which this approach is used for fault diagnosis and prediction with good results.
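The per-node interval sets that give a TNBN its multiple granularities can be sketched as a discretization step: each observed state-change time maps to the index of the interval it falls into, and different nodes may use different interval sets over the same horizon. The representation below is an illustrative assumption, not the paper's implementation:

```python
def temporal_value(event_time, intervals):
    """Map an observed state-change time to the index of the temporal
    interval containing it, or None if the event falls outside the
    node's temporal range. Intervals are half-open [lo, hi)."""
    for i, (lo, hi) in enumerate(intervals):
        if lo <= event_time < hi:
            return i
    return None

# Two temporal nodes over the same 60-minute horizon, with different
# granularities: a coarse node and a finer one.
coarse = [(0, 30), (30, 60)]
fine = [(0, 10), (10, 20), (20, 30), (30, 60)]
```

The same event time thus yields a coarser or finer discrete value depending on the node, which is the flexibility the TNBN exploits.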